Mining Big Data Streams: The Fallacy of Blind Correlation and the Importance of Models

Author

  • Hussein A. Abbass
Abstract

Big data streams mark a new era in the artificial intelligence and data mining literature. Video and voice streams have grown rapidly in recent years. A single lab-based human-computer interaction experiment that collects cognitive, physiological, and other data from one human subject can easily generate a few terabytes of data in a single hour, growing to a petabyte in less than a month. In a 2008 Wired magazine article, Chris Anderson wrote that “the data deluge makes the scientific method obsolete”. He predicted that in the age of the petabyte and beyond, a meaningful correlation analysis is enough. Anderson's comment was provocative, but some started believing it. So was he right or wrong, and why? What can we do to face the outburst of big data? Do we have the data mining tools to manage these data? Where is the future of data mining heading? In this talk, I will discuss these questions and demonstrate some answers using examples from my own work and analysis.

Copyright © 2011, Australian Computer Society, Inc. This paper appeared at the 9th Australasian Data Mining Conference (AusDM 2011), Ballarat, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 121, Peter Vamplew, Andrew Stranieri, Kok-Leong Ong, Peter Christen and Paul Kennedy, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

Similar resources

Big Data Quality: From Content to Context

Over the last 20 years, and particularly with the advent of Big Data and analytics, Data and Information Quality (DIQ) has remained a fast-growing research area. There are many views and streams in DIQ research, generally aiming at improving the effectiveness of decision making in organizations. Although much research has aimed at clarifying the role of Big Data...

Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows

Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that appear frequently in the data set and mark them as frequent patterns so that users can make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...

Estimation of parameters of metal-oxide surge arrester models using Big Bang-Big Crunch and Hybrid Big Bang-Big Crunch algorithms

Accurate modeling of metal oxide surge arresters and identification of their parameters are very important for insulation coordination studies, arrester allocation and system reliability, since the quality and reliability of lightning performance studies can be improved with a more efficient representation of the arresters' dynamic behavior. In this paper, Big Bang-Big Crunch and Hybrid Big Bang-Big Cr...

Forecasting Gold Price using Data Mining Techniques by Considering New Factors

Gold price forecasting is of great importance, and researchers have presented many models for it. Although different models can forecast the gold price under different conditions, new factors affecting the gold price appear to have a significant effect on forecast accuracy. In this paper, different factors were studied in comparison to the p...

Multi-Objective Model for Fair Pricing of Electricity Using the Parameters from the Iran Electricity Market Big Data Analysis

Assessment of the electricity market shows that electricity market data can be considered "big data". This data has been analyzed by both conventional and modern data mining methods. The predicted supply and demand variables are used as inputs to a defined multi-objective model for predicting the electricity price, which is the model's output. This shows the advantage of appl...


Publication date: 2011